{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "eb718d24",
   "metadata": {},
   "source": [
    "# Automatic Context Compaction\n",
    "\n",
    "Long-running agentic tasks can often exceed context limits. Tool heavy workflows or long conversations quickly consume the token context window. In [Effective Context Engineering for AI Agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents), we discussed how managing context can help avoid performance degradation and context rot.\n",
    "\n",
    "The Claude Agent Python SDK can help manage this context by automatically compressing conversation history when token usage exceeds a configurable threshold, allowing tasks to continue beyond the typical 200k token context limit.\n",
    "\n",
    "In this cookbook, we'll demonstrate context compaction through an **agentic customer service workflow**. Imagine you've built an AI customer service agent tasked with processing a queue of support tickets. For each ticket, you must classify the issue, search the knowledge base, set priority, route to the appropriate team, draft a response, and mark it complete. As you process ticket after ticket, the conversation history fills with classifications, knowledge base searches, and drafted responses—quickly consuming thousands of tokens.\n",
    "\n",
    "## What is Context Compaction?\n",
    "\n",
    "When building agentic workflows with tool use, conversations can grow very large as the agent iterates on complex tasks. The `compaction_control` parameter provides automatic context management by:\n",
    "\n",
    "1. Monitoring token usage per turn in the conversation\n",
    "2. When a threshold is exceeded, injecting a summary prompt as a user turn\n",
    "3. Having the model generate a summary wrapped in `<summary></summary>` tags. These tags aren't parsed, but are there to help guide the model.\n",
    "4. Clearing the conversation history and resuming with only the summary\n",
    "5. Continuing the task with the compressed context\n",
    "\n",
    "## By the end of this cookbook, you'll be able to:\n",
    " \n",
    " - Understand how to effectively manage context limits in iterative workflows\n",
    " - Write agents that leverage automatic context compaction\n",
    " - Design workflows that maintain focus across multiple iterations\n",
    "\n",
    "##  Prerequisites\n",
    "\n",
    "Before following this guide, ensure you have:\n",
    "\n",
    "**Required Knowledge**\n",
    "\n",
    "- Basic understanding of agentic patterns and tool calling\n",
    "\n",
    "**Required Tools**\n",
    "\n",
    "- Python 3.11 or higher\n",
    "- Anthropic API key\n",
    "- Anthropic SDK >= 0.74.1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "004e3b52",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "First, install the required dependencies:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "71d7f8c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install -qU anthropic python-dotenv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ba96da5",
   "metadata": {},
   "source": [
    "Note: Ensure your .env file contains:\n",
    "\n",
    "`ANTHROPIC_API_KEY=your_key_here`\n",
    "\n",
    "Load your environment variables and configure the client. We also load a helper utility to visualize Claude message responses.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "68f6d5bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()\n",
    "\n",
    "MODEL = \"claude-sonnet-4-5\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "783cf8e3",
   "metadata": {},
   "source": [
    "## Setting the Stage\n",
    "\n",
    "In [utils/customer_service_tools.py](utils/customer_service_tools.py), we've defined several functions for processing customer support tickets:\n",
    "\n",
    "- `get_next_ticket()` - Retrieves the next unprocessed ticket from the queue\n",
    "- `classify_ticket(ticket_id, category)` - Categorizes issues as billing, technical, account, product, or shipping\n",
    "- `search_knowledge_base(query)` - Finds relevant help articles and solutions\n",
    "- `set_priority(ticket_id, priority)` - Assigns priority levels (low, medium, high, urgent)\n",
    "- `route_to_team(ticket_id, team)` - Routes tickets to the appropriate support team\n",
    "- `draft_response(ticket_id, response_text)` - Creates customer-facing responses\n",
    "- `mark_complete(ticket_id)` - Finalizes processed tickets\n",
    "\n",
    "For a customer service agent, these tools enable processing tickets systematically. Each ticket requires classification, research, prioritization, routing, and response drafting. When processing 20-30 tickets in sequence, the conversation history fills with tool results from every classification, every knowledge base search, and every drafted response, causing linear token growth.\n",
    "\n",
    "The `beta_tool` decorator is used on the tools to make them accessible to the Claude agent. The decorator extracts the function arguments and docstring and provides these to Claude as tool metadata.\n",
    "\n",
    "```python\n",
    "import anthropic\n",
    "from anthropic import beta_tool\n",
    "\n",
    "@beta_tool\n",
    "def get_next_ticket() -> dict:\n",
    "    \"\"\"Retrieve the next unprocessed support ticket from the queue.\"\"\"\n",
    "    ...\n",
    "```\n"
   ]
  },
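  {
   "cell_type": "markdown",
   "id": "b7c2e9a4",
   "metadata": {},
   "source": [
    "To make the decorator's role more concrete, here is a simplified sketch of how a function's name, docstring, and parameters can be read with the standard library's `inspect` module. It is an illustration only: `describe_tool` and `get_next_ticket_demo` are hypothetical names, and the SDK's actual implementation and schema format may differ.\n",
    "\n",
    "```python\n",
    "import inspect\n",
    "\n",
    "\n",
    "def get_next_ticket_demo(queue_name: str = 'default') -> dict:\n",
    "    '''Retrieve the next unprocessed support ticket from the queue.'''\n",
    "    return {'ticket_id': 'TICKET-1', 'queue': queue_name}\n",
    "\n",
    "\n",
    "def describe_tool(func) -> dict:\n",
    "    # Collect the pieces a tool-calling API typically needs:\n",
    "    # a name, a description, and the parameter names.\n",
    "    signature = inspect.signature(func)\n",
    "    return {\n",
    "        'name': func.__name__,\n",
    "        'description': inspect.getdoc(func),\n",
    "        'parameters': list(signature.parameters),\n",
    "    }\n",
    "\n",
    "\n",
    "metadata = describe_tool(get_next_ticket_demo)\n",
    "print(metadata)\n",
    "```\n",
    "\n",
    "This is why clear docstrings and descriptive argument names matter: they are the only description of the tool that the model sees."
   ]
  },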
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "58d2a922",
   "metadata": {},
   "outputs": [],
   "source": [
    "import anthropic\n",
    "from utils.customer_service_tools import (\n",
    "    classify_ticket,\n",
    "    draft_response,\n",
    "    get_next_ticket,\n",
    "    initialize_ticket_queue,\n",
    "    mark_complete,\n",
    "    route_to_team,\n",
    "    search_knowledge_base,\n",
    "    set_priority,\n",
    ")\n",
    "\n",
    "client = anthropic.Anthropic()\n",
    "\n",
    "tools = [\n",
    "    get_next_ticket,\n",
    "    classify_ticket,\n",
    "    search_knowledge_base,\n",
    "    set_priority,\n",
    "    route_to_team,\n",
    "    draft_response,\n",
    "    mark_complete,\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77fecfb8",
   "metadata": {},
   "source": [
    "## Baseline: Running Without Compaction\n",
    "\n",
    "Let's start with a realistic customer service scenario: Processing a queue of support tickets. \n",
    "\n",
    "The workflow looks like this:\n",
    "\n",
    "**For Each Ticket:**\n",
    "1. Fetch the ticket using `get_next_ticket()`\n",
    "2. Classify the issue category (billing, technical, account, product, shipping)\n",
    "3. Search the knowledge base for relevant information\n",
    "4. Set appropriate priority (low, medium, high, urgent)\n",
    "5. Route to the correct team\n",
    "6. Draft a customer response\n",
    "7. Mark the ticket complete\n",
    "8. Move to the next ticket\n",
    "\n",
    "**The Challenge**: With 5 tickets in the queue, and each requiring 7 tool calls, Claude will make 35 or more tool calls. The results from each step including classification knowledge base search, and drafted responses accumulate in the conversation history. Without compaction, all this data stays in memory for every ticket, by ticket #5, the context includes complete details from all 4 previous tickets.\n",
    "\n",
    "Let's run this workflow **without compaction** first and observe what happens:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "uef86nvtl4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Turn  1: Input=  1,537 tokens | Output=   57 tokens | Messages= 1 | Cumulative In=   1,537\n",
      "Turn  2: Input=  1,760 tokens | Output=  102 tokens | Messages= 3 | Cumulative In=   3,297\n",
      "Turn  3: Input=  1,905 tokens | Output=   88 tokens | Messages= 5 | Cumulative In=   5,202\n",
      "Turn  4: Input=  2,237 tokens | Output=   84 tokens | Messages= 7 | Cumulative In=   7,439\n",
      "Turn  5: Input=  2,385 tokens | Output=   89 tokens | Messages= 9 | Cumulative In=   9,824\n",
      "Turn  6: Input=  2,537 tokens | Output=  301 tokens | Messages=11 | Cumulative In=  12,361\n",
      "Turn  7: Input=  2,888 tokens | Output=   67 tokens | Messages=13 | Cumulative In=  15,249\n",
      "Turn  8: Input=  3,079 tokens | Output=   56 tokens | Messages=15 | Cumulative In=  18,328\n",
      "Turn  9: Input=  3,316 tokens | Output=   91 tokens | Messages=17 | Cumulative In=  21,644\n",
      "Turn 10: Input=  3,450 tokens | Output=   84 tokens | Messages=19 | Cumulative In=  25,094\n",
      "Turn 11: Input=  3,777 tokens | Output=   84 tokens | Messages=21 | Cumulative In=  28,871\n",
      "Turn 12: Input=  3,925 tokens | Output=   89 tokens | Messages=23 | Cumulative In=  32,796\n",
      "Turn 13: Input=  4,077 tokens | Output=  349 tokens | Messages=25 | Cumulative In=  36,873\n",
      "Turn 14: Input=  4,476 tokens | Output=   67 tokens | Messages=27 | Cumulative In=  41,349\n",
      "Turn 15: Input=  4,668 tokens | Output=   56 tokens | Messages=29 | Cumulative In=  46,017\n",
      "Turn 16: Input=  4,894 tokens | Output=   91 tokens | Messages=31 | Cumulative In=  50,911\n",
      "Turn 17: Input=  5,028 tokens | Output=   84 tokens | Messages=33 | Cumulative In=  55,939\n",
      "Turn 18: Input=  5,333 tokens | Output=   84 tokens | Messages=35 | Cumulative In=  61,272\n",
      "Turn 19: Input=  5,481 tokens | Output=   89 tokens | Messages=37 | Cumulative In=  66,753\n",
      "Turn 20: Input=  5,633 tokens | Output=  334 tokens | Messages=39 | Cumulative In=  72,386\n",
      "Turn 21: Input=  6,017 tokens | Output=   67 tokens | Messages=41 | Cumulative In=  78,403\n",
      "Turn 22: Input=  6,209 tokens | Output=   56 tokens | Messages=43 | Cumulative In=  84,612\n",
      "Turn 23: Input=  6,435 tokens | Output=   91 tokens | Messages=45 | Cumulative In=  91,047\n",
      "Turn 24: Input=  6,569 tokens | Output=   84 tokens | Messages=47 | Cumulative In=  97,616\n",
      "Turn 25: Input=  6,896 tokens | Output=   84 tokens | Messages=49 | Cumulative In= 104,512\n",
      "Turn 26: Input=  7,044 tokens | Output=   89 tokens | Messages=51 | Cumulative In= 111,556\n",
      "Turn 27: Input=  7,196 tokens | Output=  372 tokens | Messages=53 | Cumulative In= 118,752\n",
      "Turn 28: Input=  7,618 tokens | Output=   67 tokens | Messages=55 | Cumulative In= 126,370\n",
      "Turn 29: Input=  7,808 tokens | Output=   56 tokens | Messages=57 | Cumulative In= 134,178\n",
      "Turn 30: Input=  8,040 tokens | Output=   96 tokens | Messages=59 | Cumulative In= 142,218\n",
      "Turn 31: Input=  8,179 tokens | Output=   85 tokens | Messages=61 | Cumulative In= 150,397\n",
      "Turn 32: Input=  8,508 tokens | Output=   84 tokens | Messages=63 | Cumulative In= 158,905\n",
      "Turn 33: Input=  8,656 tokens | Output=   89 tokens | Messages=65 | Cumulative In= 167,561\n",
      "Turn 34: Input=  8,808 tokens | Output=  332 tokens | Messages=67 | Cumulative In= 176,369\n",
      "Turn 35: Input=  9,190 tokens | Output=   67 tokens | Messages=69 | Cumulative In= 185,559\n",
      "Turn 36: Input=  9,382 tokens | Output=   60 tokens | Messages=71 | Cumulative In= 194,941\n",
      "Turn 37: Input=  9,475 tokens | Output=  297 tokens | Messages=73 | Cumulative In= 204,416\n",
      "\n",
      "============================================================\n",
      "BASELINE RESULTS (NO COMPACTION)\n",
      "============================================================\n",
      "Total turns:   37\n",
      "Input tokens:  204,416\n",
      "Output tokens: 4,422\n",
      "Total tokens:  208,838\n",
      "============================================================\n"
     ]
    }
   ],
   "source": [
    "from anthropic.types.beta import BetaMessageParam\n",
    "\n",
    "num_tickets = 5\n",
    "initialize_ticket_queue(num_tickets)\n",
    "\n",
    "messages: list[BetaMessageParam] = [\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": f\"\"\"You are an AI customer service agent. Your task is to process support tickets from a queue.\n",
    "\n",
    "For EACH ticket, you must complete ALL these steps:\n",
    "\n",
    "1. **Fetch ticket**: Call get_next_ticket() to retrieve the next unprocessed ticket\n",
    "2. **Classify**: Call classify_ticket() to categorize the issue (billing/technical/account/product/shipping)\n",
    "3. **Research**: Call search_knowledge_base() to find relevant information for this ticket type\n",
    "4. **Prioritize**: Call set_priority() to assign priority (low/medium/high/urgent) based on severity\n",
    "5. **Route**: Call route_to_team() to assign to the appropriate team\n",
    "6. **Draft**: Call draft_response() to create a helpful customer response using KB information\n",
    "7. **Complete**: Call mark_complete() to finalize this ticket\n",
    "8. **Continue**: Immediately fetch the next ticket and repeat\n",
    "\n",
    "IMPORTANT RULES:\n",
    "- Process tickets ONE AT A TIME in sequence\n",
    "- Complete ALL 7 steps for each ticket before moving to the next\n",
    "- Keep fetching and processing tickets until you get an error that the queue is empty\n",
    "- There are {num_tickets} tickets total - process all of them\n",
    "- Be thorough but efficient\n",
    "\n",
    "Begin by fetching the first ticket.\"\"\",\n",
    "    }\n",
    "]\n",
    "\n",
    "total_input = 0\n",
    "total_output = 0\n",
    "turn_count = 0\n",
    "\n",
    "runner = client.beta.messages.tool_runner(\n",
    "    model=MODEL,\n",
    "    max_tokens=4096,\n",
    "    tools=tools,\n",
    "    messages=messages,\n",
    ")\n",
    "\n",
    "for message in runner:\n",
    "    messages_list = list(runner._params[\"messages\"])\n",
    "    turn_count += 1\n",
    "    total_input += message.usage.input_tokens\n",
    "    total_output += message.usage.output_tokens\n",
    "    print(\n",
    "        f\"Turn {turn_count:2d}: Input={message.usage.input_tokens:7,} tokens | \"\n",
    "        f\"Output={message.usage.output_tokens:5,} tokens | \"\n",
    "        f\"Messages={len(messages_list):2d} | \"\n",
    "        f\"Cumulative In={total_input:8,}\"\n",
    "    )\n",
    "\n",
    "print(f\"\\n{'=' * 60}\")\n",
    "print(\"BASELINE RESULTS (NO COMPACTION)\")\n",
    "print(f\"{'=' * 60}\")\n",
    "print(f\"Total turns:   {turn_count}\")\n",
    "print(f\"Input tokens:  {total_input:,}\")\n",
    "print(f\"Output tokens: {total_output:,}\")\n",
    "print(f\"Total tokens:  {total_input + total_output:,}\")\n",
    "print(f\"{'=' * 60}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dd25c6f",
   "metadata": {},
   "source": [
    "Now that we have our baseline, we have a better picture of how context grows without compaction. As you can see, each turn results in linear token growth, as every turn adds more tokens to the input. \n",
    "\n",
    "This leads to high token consumption and potential context limits being reached quickly. By the 27th turn, we have a cumulative 150,000 input tokens just for 5 tickets.\n",
    "\n",
    "Let's review Claude's final response after processing all 5 tickets without compaction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d8b51b65",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---\n",
      "\n",
      "## ✅ ALL TICKETS PROCESSED SUCCESSFULLY!\n",
      "\n",
      "**Summary of Completed Work:**\n",
      "\n",
      "I have successfully processed all 5 tickets from the queue. Here's what was accomplished:\n",
      "\n",
      "1. **TICKET-1** - Sam Smith - Payment method update error\n",
      "   - Category: Billing | Priority: High | Team: billing-team\n",
      "   \n",
      "2. **TICKET-2** - Morgan Johnson - Missing delivery\n",
      "   - Category: Shipping | Priority: High | Team: logistics-team\n",
      "   \n",
      "3. **TICKET-3** - Morgan Jones - Email address change request\n",
      "   - Category: Account | Priority: Medium | Team: account-services\n",
      "   \n",
      "4. **TICKET-4** - Alex Johnson - Wrong item delivered\n",
      "   - Category: Shipping | Priority: High | Team: logistics-team\n",
      "   \n",
      "5. **TICKET-5** - Morgan Jones - Refund request for cancelled subscription\n",
      "   - Category: Billing | Priority: High | Team: billing-team\n",
      "\n",
      "Each ticket was:\n",
      "✅ Classified correctly\n",
      "✅ Researched in the knowledge base\n",
      "✅ Assigned appropriate priority\n",
      "✅ Routed to the correct team\n",
      "✅ Given a detailed, helpful customer response\n",
      "✅ Marked as complete\n",
      "\n",
      "The queue is now empty and all tickets have been processed!\n"
     ]
    }
   ],
   "source": [
    "print(message.content[-1].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "klok33ohsvn",
   "metadata": {},
   "source": [
    "### Understanding the Problem\n",
    "\n",
    "In the baseline workflow above, Claude had to:\n",
    "- Process **5 support tickets** sequentially\n",
    "- Complete **7 steps per ticket** (fetch, classify, research, prioritize, route, draft, complete)\n",
    "- Make **35 tool calls** with results accumulating in conversation history\n",
    "- Store **every classification, every knowledge base search, every drafted response** in memory\n",
    "\n",
    "**Why This Happens**:\n",
    "1. **Linear token growth** - With each tool use, the entire conversation history (including all previous tool results) is sent to Claude\n",
    "2. **Context pollution** - Ticket A's classification and drafted response remain in context while processing Ticket B\n",
    "3. **Compounding costs** - By the time you're on Ticket #5, you're sending data from all 4 previous tickets on every API call\n",
    "4. **Slower responses** - Processing massive contexts takes longer\n",
    "5. **Risk of hitting limits** - Eventually you hit the 200k token context window\n",
    "\n",
    "\n",
    "**What We Actually Need**: After completing Ticket A, we only need a **brief summary** (ticket resolved, category, priority) - not the full classification result, knowledge base search, and complete drafted response. The detailed workflow should be discarded, keeping only completion summaries.\n",
    "\n",
    "Let's see how automatic context compaction solves this problem."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "byut5h7hi3",
   "metadata": {},
   "source": [
    "## Enabling Automatic Context Compaction\n",
    "\n",
    "Let's run the exact same customer service workflow, but with automatic context compaction enabled. We simply add the `compaction_control` parameter to our tool runner.\n",
    "\n",
    "The `compaction_control` parameter has one required field and several optional ones:\n",
    "\n",
    "- **`enabled`** (required): Boolean to turn compaction on/off\n",
    "- **`context_token_threshold`** (optional): Token count that triggers compaction (default: 100,000)\n",
    "- **`model`** (optional): Model to use for summarization (defaults to the main model)\n",
    "- **`summary_prompt`** (optional): Custom prompt for generating summaries\n",
    "\n",
    "For this customer service workflow, we'll use a **5,000 token threshold**. This means after processing several tickets compaction will auto-trigger. This allows Claude to:\n",
    "1. **Keep completion summaries** (tickets resolved, categories, outcomes)\n",
    "2. **Discard detailed tool results** (full KB articles, complete classifications, drafted response text)\n",
    "3. **Start fresh** when processing the next batch of tickets\n",
    "\n",
    "This mimics how a real support agent works: resolve the ticket, document it briefly, move to the next case."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "x6lnx8d20fr",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Turn  1: Input=  1,537 tokens | Output=   57 tokens | Messages= 1 | Cumulative In=   1,537\n",
      "Turn  2: Input=  1,755 tokens | Output=  108 tokens | Messages= 3 | Cumulative In=   3,292\n",
      "Turn  3: Input=  1,906 tokens | Output=   88 tokens | Messages= 5 | Cumulative In=   5,198\n",
      "Turn  4: Input=  2,216 tokens | Output=   84 tokens | Messages= 7 | Cumulative In=   7,414\n",
      "Turn  5: Input=  2,364 tokens | Output=   89 tokens | Messages= 9 | Cumulative In=   9,778\n",
      "Turn  6: Input=  2,516 tokens | Output=  332 tokens | Messages=11 | Cumulative In=  12,294\n",
      "Turn  7: Input=  2,898 tokens | Output=   67 tokens | Messages=13 | Cumulative In=  15,192\n",
      "Turn  8: Input=  3,090 tokens | Output=   56 tokens | Messages=15 | Cumulative In=  18,282\n",
      "Turn  9: Input=  3,325 tokens | Output=   97 tokens | Messages=17 | Cumulative In=  21,607\n",
      "Turn 10: Input=  3,465 tokens | Output=   90 tokens | Messages=19 | Cumulative In=  25,072\n",
      "Turn 11: Input=  3,801 tokens | Output=   84 tokens | Messages=21 | Cumulative In=  28,873\n",
      "Turn 12: Input=  3,949 tokens | Output=   89 tokens | Messages=23 | Cumulative In=  32,822\n",
      "Turn 13: Input=  4,101 tokens | Output=  368 tokens | Messages=25 | Cumulative In=  36,923\n",
      "Turn 14: Input=  4,519 tokens | Output=   67 tokens | Messages=27 | Cumulative In=  41,442\n",
      "Turn 15: Input=  4,711 tokens | Output=   57 tokens | Messages=29 | Cumulative In=  46,153\n",
      "Turn 16: Input=  4,934 tokens | Output=   97 tokens | Messages=31 | Cumulative In=  51,087\n",
      "\n",
      "============================================================\n",
      "🔄 Compaction occurred! Messages: 31 → 1\n",
      "   Summary message after compaction:\n",
      "<summary>\n",
      "## Support Ticket Processing Progress Summary\n",
      "\n",
      "### Task Overview\n",
      "Processing 5 support tickets sequentially, completing all 7 steps for each ticket (fetch, classify, research, prioritize, route, draft, complete).\n",
      "\n",
      "### Tickets Completed (2 of 5)\n",
      "\n",
      "**TICKET-1 (Chris Davis) - COMPLETED**\n",
      "- Issue: Account locked, unlock email link not working\n",
      "- Category: account\n",
      "- Priority: high\n",
      "- Team: account-services\n",
      "- Status: resolved\n",
      "- Response: Provided guidance on checking spam folder, link expiration (1 hour), and requesting new unlock link\n",
      "\n",
      "**TICKET-2 (Chris Williams) - COMPLETED**\n",
      "- Issue: Unrecognized $49.99 charge on 2025-10-30\n",
      "- Category: billing\n",
      "- Priority: high\n",
      "- Team: billing-team\n",
      "- Status: resolved\n",
      "- Response: Explained billing cycles, subscription possibility, and refund policy (5-7 business days, pro-rated for annual plans)\n",
      "\n",
      "### Current Status\n",
      "**TICKET-3 (John Jones) - IN PROGRESS**\n",
      "- Issue: Asking about Google Sheets integration for project management\n",
      "- Category: product\n",
      "- Priority: NOT YET SET\n",
      "- Team: NOT YET ASSIGNED\n",
      "- Steps completed: 1 (fetch), 2 (classify)\n",
      "- Steps remaining: 3 (research KB), 4 (set priority), 5 (route), 6 (draft), 7 (mark complete)\n",
      "\n",
      "### Next Steps\n",
      "1. Complete TICKET-3: Search knowledge base for product integration info\n",
      "2. Set priority (likely low/medium for feature inquiry)\n",
      "3. Route to product-team\n",
      "4. Draft response about integrations\n",
      "5. Mark complete\n",
      "6. Fetch and process TICKET-4\n",
      "7. Fetch and process TICKET-5\n",
      "\n",
      "### Key Knowledge Base Info Learned\n",
      "- Account: Password reset links expire in 1 hour, sent from noreply@support.example.com\n",
      "- Billing: Refunds take 5-7 business days, pro-rated for annual plans, billing on same date monthly/yearly\n",
      "\n",
      "### Remaining Work\n",
      "3 tickets left to process (TICKET-3 currently in progress, then TICKET-4 and TICKET-5)\n",
      "</summary>\n",
      "\n",
      "============================================================\n",
      "Turn 17: Input=  1,774 tokens | Output=   94 tokens | Messages= 1 | Cumulative In=  52,861\n",
      "Turn 18: Input=  1,906 tokens | Output=   95 tokens | Messages= 3 | Cumulative In=  54,767\n",
      "Turn 19: Input=  2,365 tokens | Output=  431 tokens | Messages= 5 | Cumulative In=  57,132\n",
      "Turn 20: Input=  3,164 tokens | Output=   60 tokens | Messages= 7 | Cumulative In=  60,296\n",
      "Turn 21: Input=  3,383 tokens | Output=  160 tokens | Messages= 9 | Cumulative In=  63,679\n",
      "Turn 22: Input=  3,872 tokens | Output=  447 tokens | Messages=11 | Cumulative In=  67,551\n",
      "Turn 23: Input=  4,687 tokens | Output=   64 tokens | Messages=13 | Cumulative In=  72,238\n",
      "Turn 24: Input=  4,914 tokens | Output=  160 tokens | Messages=15 | Cumulative In=  77,152\n",
      "\n",
      "============================================================\n",
      "🔄 Compaction occurred! Messages: 15 → 1\n",
      "   Summary message after compaction:\n",
      "<summary>\n",
      "## Support Ticket Processing Progress Summary\n",
      "\n",
      "### Task Overview\n",
      "Processing 5 support tickets sequentially, completing all 7 steps for each ticket (fetch, classify, research, prioritize, route, draft, complete).\n",
      "\n",
      "### Tickets Completed (4 of 5)\n",
      "\n",
      "**TICKET-1 (Chris Davis) - COMPLETED**\n",
      "- Issue: Account locked, unlock email link not working\n",
      "- Category: account\n",
      "- Priority: high\n",
      "- Team: account-services\n",
      "- Status: resolved\n",
      "- Response: Provided guidance on checking spam folder, link expiration (1 hour), and requesting new unlock link\n",
      "\n",
      "**TICKET-2 (Chris Williams) - COMPLETED**\n",
      "- Issue: Unrecognized $49.99 charge on 2025-10-30\n",
      "- Category: billing\n",
      "- Priority: high\n",
      "- Team: billing-team\n",
      "- Status: resolved\n",
      "- Response: Explained billing cycles, subscription possibility, and refund policy (5-7 business days, pro-rated for annual plans)\n",
      "\n",
      "**TICKET-3 (John Jones) - COMPLETED**\n",
      "- Issue: Asking about Google Sheets integration for project management\n",
      "- Category: product\n",
      "- Priority: medium\n",
      "- Team: product-success\n",
      "- Status: resolved\n",
      "- Response: Explained that Product Success team will provide details on integration options, API access, and current/planned features\n",
      "\n",
      "**TICKET-4 (Sam Johnson) - COMPLETED**\n",
      "- Issue: Wants to know differences between Standard and Premium plans, specifically \"advanced analytics\"\n",
      "- Category: product\n",
      "- Priority: low\n",
      "- Team: product-success\n",
      "- Status: resolved\n",
      "- Response: Explained that Product Success team will provide detailed plan comparison and feature breakdown\n",
      "\n",
      "### Current Status\n",
      "**TICKET-5 (Morgan Brown) - IN PROGRESS**\n",
      "- Issue: Damaged package (Order #ORD-43312), broken product inside, needs replacement\n",
      "- Category: shipping (classified)\n",
      "- Priority: NOT YET SET\n",
      "- Team: NOT YET ASSIGNED\n",
      "- Steps completed: 1 (fetch), 2 (classify), 3 (research KB - no shipping info found)\n",
      "- Steps remaining: 4 (set priority), 5 (route), 6 (draft), 7 (mark complete)\n",
      "\n",
      "### Next Steps for TICKET-5\n",
      "1. Set priority (likely HIGH - damaged/broken product requiring replacement)\n",
      "2. Route to appropriate team (likely fulfillment, operations, or customer-service team)\n",
      "3. Draft response addressing damaged shipment, replacement process, and next steps\n",
      "4. Mark complete\n",
      "5. **ALL TICKETS WILL BE COMPLETE**\n",
      "\n",
      "### Key Knowledge Base Info Learned\n",
      "- **Account**: Password reset links expire in 1 hour, sent from noreply@support.example.com\n",
      "- **Billing**: Refunds take 5-7 business days, pro-rated for annual plans, billing on same date monthly/yearly; accepts Visa, Mastercard, Amex, PayPal\n",
      "- **Technical**: Max upload 100MB, supported formats: PDF, DOCX, PNG, JPG, CSV; system requirements: 4GB RAM, modern browsers\n",
      "- **Product category**: Does not exist in KB (only billing, technical, account available)\n",
      "- **Shipping info**: Not found in knowledge base\n",
      "\n",
      "### Team Routing Patterns Observed\n",
      "- account-services: Account access issues\n",
      "- billing-team: Billing/payment inquiries\n",
      "- product-success: Product features, integrations, plan comparisons\n",
      "\n",
      "### Remaining Work\n",
      "1 ticket left to complete (TICKET-5 - final ticket, currently in progress at step 3 of 7)\n",
      "</summary>\n",
      "\n",
      "============================================================\n",
      "Turn 25: Input=  2,077 tokens | Output=  496 tokens | Messages= 1 | Cumulative In=  79,229\n",
      "Turn 26: Input=  2,942 tokens | Output=  438 tokens | Messages= 3 | Cumulative In=  82,171\n",
      "\n",
      "============================================================\n",
      "OPTIMIZED RESULTS (WITH COMPACTION)\n",
      "============================================================\n",
      "Total turns:   26\n",
      "Compactions:   2\n",
      "Input tokens:  82,171\n",
      "Output tokens: 4,275\n",
      "Total tokens:  86,446\n",
      "============================================================\n"
     ]
    }
   ],
   "source": [
    "# Re-initialize queue and run with compaction\n",
    "initialize_ticket_queue(num_tickets)\n",
    "\n",
    "total_input_compact = 0\n",
    "total_output_compact = 0\n",
    "turn_count_compact = 0\n",
    "compaction_count = 0\n",
    "prev_msg_count = 0\n",
    "\n",
    "runner = client.beta.messages.tool_runner(\n",
    "    model=MODEL,\n",
    "    max_tokens=4096,\n",
    "    tools=tools,\n",
    "    messages=messages,\n",
    "    compaction_control={\n",
    "        \"enabled\": True,\n",
    "        \"context_token_threshold\": 5000,\n",
    "    },\n",
    ")\n",
    "\n",
    "for message in runner:\n",
    "    turn_count_compact += 1\n",
    "    total_input_compact += message.usage.input_tokens\n",
    "    total_output_compact += message.usage.output_tokens\n",
    "    messages_list = list(runner._params[\"messages\"])\n",
    "    curr_msg_count = len(messages_list)\n",
    "\n",
    "    if curr_msg_count < prev_msg_count:\n",
    "        # We can identify compaction when the message count decreases\n",
    "        compaction_count += 1\n",
    "\n",
    "        print(f\"\\n{'=' * 60}\")\n",
    "        print(f\"🔄 Compaction occurred! Messages: {prev_msg_count} → {curr_msg_count}\")\n",
    "        print(\"   Summary message after compaction:\")\n",
    "        print(messages_list[-1][\"content\"][-1].text)  # type: ignore\n",
    "        print(f\"\\n{'=' * 60}\")\n",
    "\n",
    "    prev_msg_count = curr_msg_count\n",
    "    print(\n",
    "        f\"Turn {turn_count_compact:2d}: Input={message.usage.input_tokens:7,} tokens | \"\n",
    "        f\"Output={message.usage.output_tokens:5,} tokens | \"\n",
    "        f\"Messages={len(messages_list):2d} | \"\n",
    "        f\"Cumulative In={total_input_compact:8,}\"\n",
    "    )\n",
    "\n",
    "print(f\"\\n{'=' * 60}\")\n",
    "print(\"OPTIMIZED RESULTS (WITH COMPACTION)\")\n",
    "print(f\"{'=' * 60}\")\n",
    "print(f\"Total turns:   {turn_count_compact}\")\n",
    "print(f\"Compactions:   {compaction_count}\")\n",
    "print(f\"Input tokens:  {total_input_compact:,}\")\n",
    "print(f\"Output tokens: {total_output_compact:,}\")\n",
    "print(f\"Total tokens:  {total_input_compact + total_output_compact:,}\")\n",
    "print(f\"{'=' * 60}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5cbbfe32",
   "metadata": {},
   "source": [
    "With automatic context compaction enabled, we can see that our token usage per turn does not grow linearly, but is reduced after each compaction event. There were two compaction events during the processing of tickets, and the follow turn shows a reduction in total token usage.\n",
    "\n",
    "Compared to the baseline version, we only used 79,000 tokens. We've also printed out the summary messages generated after each compaction event, showing how Claude effectively condensed prior ticket details into summaries.\n",
    "\n",
    "Let's look at the final response after processing all 5 tickets with compaction enabled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "24dd5c7c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Perfect! **ALL 5 TICKETS HAVE BEEN SUCCESSFULLY COMPLETED!** 🎉\n",
      "\n",
      "## Final Summary - All Tickets Processed\n",
      "\n",
      "### TICKET-5 (Morgan Brown) - **COMPLETED** ✓\n",
      "- **Issue**: Damaged package (Order #ORD-43312), broken product inside, needs replacement\n",
      "- **Category**: shipping\n",
      "- **Priority**: high\n",
      "- **Team**: logistics-team\n",
      "- **Status**: resolved\n",
      "- **Response**: Apologized for damaged shipment, escalated to Logistics Team with HIGH priority, explained they'll process immediate replacement, provide return instructions, and contact customer with tracking and timeline\n",
      "\n",
      "---\n",
      "\n",
      "## 🎯 ALL 5 TICKETS COMPLETED\n",
      "\n",
      "1. ✅ **TICKET-1** (Chris Davis) - Account locked → account-services\n",
      "2. ✅ **TICKET-2** (Chris Williams) - Billing charge → billing-team  \n",
      "3. ✅ **TICKET-3** (John Jones) - Google Sheets integration → product-success\n",
      "4. ✅ **TICKET-4** (Sam Johnson) - Plan comparison → product-success\n",
      "5. ✅ **TICKET-5** (Morgan Brown) - Damaged shipment → logistics-team\n",
      "\n",
      "### Processing Statistics\n",
      "- **Total tickets processed**: 5 of 5 (100%)\n",
      "- **Steps per ticket**: 7 (fetch, classify, research, prioritize, route, draft, complete)\n",
      "- **Total operations**: 35 successful operations\n",
      "- **Categories used**: account, billing, product (2x), shipping\n",
      "- **Teams utilized**: account-services, billing-team, product-success (2x), logistics-team\n",
      "- **Priority distribution**: 2 high, 2 medium, 1 low\n",
      "\n",
      "All tickets have been properly classified, prioritized, routed to the appropriate teams, and have draft responses ready for team review! 🎊\n"
     ]
    }
   ],
   "source": [
    "print(message.content[-1].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "vb4cesnmb8",
   "metadata": {},
   "source": [
    "### Comparing Results\n",
    "\n",
    "With compaction enabled, we can see a clear differece between the two runs in token savings, while preserving the quality of the workflow and final summary.\n",
    "\n",
    "Here's what changed with automatic context compaction:\n",
    "\n",
    "1. **Context resets after several tickets** - When processing 5-7 tickets generates 5k+ tokens of tool results, the SDK automatically:\n",
    "   - Injects a summary prompt\n",
    "   - Has Claude generate a completion summary wrapped in `<summary></summary>` tags\n",
    "   - Clears the conversation history and discards detailed classifications, KB searches, and responses\n",
    "   - Continues with only the completion summary\n",
    "\n",
    "2. **Input tokens stay bounded** - Instead of accumulating to 100k+ as we process more tickets, input tokens reset after each compaction. When processing Ticket #5, we're NOT carrying the full tool results from Tickets #1-4.\n",
    "\n",
    "3. **Task completes successfully** - The workflow continues smoothly through all tickets without hitting context limits\n",
    "\n",
    "4. **Quality is preserved** - The summaries retain critical information:\n",
    "   - Tickets processed with their IDs\n",
    "   - Categories and priorities assigned\n",
    "   - Teams routed to\n",
    "   - Overall progress status\n",
    "   \n",
    "   All tickets are still properly classified, prioritized, routed, and responded to.\n",
    "\n",
    "5. **Natural workflow** - This mirrors how real support agents work: resolve a ticket, document it briefly in the system, close it, move to the next one. You don't keep every knowledge base article and full response draft open while working on new tickets.\n",
    "\n",
    "Let's visualize the token savings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "z9lvigc94p",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "======================================================================\n",
      "TOKEN USAGE COMPARISON\n",
      "======================================================================\n",
      "Metric                         Baseline             With Compaction     \n",
      "----------------------------------------------------------------------\n",
      "Input tokens:                              204,416              82,171\n",
      "Output tokens:                               4,422               4,275\n",
      "Total tokens:                              208,838              86,446\n",
      "Compactions:                                   N/A                   2\n",
      "======================================================================\n",
      "\n",
      "💰 Token Savings: 122,392 tokens (58.6% reduction)\n"
     ]
    }
   ],
   "source": [
    "# Compare baseline vs compaction\n",
    "print(\"=\" * 70)\n",
    "print(\"TOKEN USAGE COMPARISON\")\n",
    "print(\"=\" * 70)\n",
    "print(f\"{'Metric':<30} {'Baseline':<20} {'With Compaction':<20}\")\n",
    "print(\"-\" * 70)\n",
    "print(f\"{'Input tokens:':<30} {total_input:>19,} {total_input_compact:>19,}\")\n",
    "print(f\"{'Output tokens:':<30} {total_output:>19,} {total_output_compact:>19,}\")\n",
    "print(\n",
    "    f\"{'Total tokens:':<30} {total_input + total_output:>19,} {total_input_compact + total_output_compact:>19,}\"\n",
    ")\n",
    "print(f\"{'Compactions:':<30} {'N/A':>19} {compaction_count:>19}\")\n",
    "print(\"=\" * 70)\n",
    "\n",
    "# Calculate savings\n",
    "token_savings = (total_input + total_output) - (total_input_compact + total_output_compact)\n",
    "savings_percent = (\n",
    "    (token_savings / (total_input + total_output)) * 100 if (total_input + total_output) > 0 else 0\n",
    ")\n",
    "\n",
    "print(f\"\\n💰 Token Savings: {token_savings:,} tokens ({savings_percent:.1f}% reduction)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "lzvf1mw7o6",
   "metadata": {},
   "source": [
    "## How Compaction Works Under the Hood\n",
    "\n",
    "When the `tool_runner` detects that token usage has exceeded the threshold, it automatically:\n",
    "\n",
    "1. **Pauses the workflow** before making the next API call\n",
    "2. **Injects a summary request** as a user message asking Claude to summarize progress\n",
    "3. **Generates a summary** - Claude produces a summary wrapped in `<summary></summary>` tags containing:\n",
    "   - **Completed tickets**: Brief records of tickets resolved (IDs, categories, priorities, outcomes)\n",
    "   - **Progress status**: How many tickets processed, how many remain\n",
    "   - **Key patterns**: Any notable trends across tickets\n",
    "   - **Next steps**: What to do next (continue processing remaining tickets)\n",
    "4. **Clears history** - The entire conversation history (including all tool results) is replaced with just the summary\n",
    "5. **Resumes processing** - Claude continues working with the compressed context, processing the next batch of tickets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "v64ljd0a79",
   "metadata": {},
   "source": [
    "## Customizing Compaction Configuration\n",
    "\n",
    "You can customize how compaction works to fit your specific use case. Here are the key configuration options:\n",
    "\n",
    "### Adjusting the Threshold\n",
    "\n",
    "The `context_token_threshold` determines when compaction triggers:\n",
    "\n",
    "```python\n",
    "compaction_control={\n",
    "    \"enabled\": True,\n",
    "    \"context_token_threshold\": 5000,  # Compact after processing 5-7 tickets\n",
    "}\n",
    "```\n",
    "\n",
    "The threshold should not be set too low, otherwise the summary itself could trigger a compaction. We set a threshold of 5,000 tokens for demonstration purposes, but in practice, experiment with different settings to find what works best for your workflow.\n",
    "\n",
    "Here some general guidelines:\n",
    "\n",
    "- **Low thresholds (5k-20k)**: \n",
    "  - Use for iterative task processing with clear boundaries\n",
    "  - More frequent compaction, minimal context accumulation\n",
    "  - Best for sequential entity processing\n",
    "  \n",
    "- **Medium thresholds (50k-100k)**: \n",
    "  - Multi-phase workflows with fewer, larger natural checkpoints\n",
    "  - Balance between context retention and management\n",
    "  - Suitable for workflows with expensive tool calls\n",
    "  \n",
    "- **High thresholds (100k-150k)**: \n",
    "  - Tasks requiring substantial historical context\n",
    "  - Less frequent compaction preserves more raw details\n",
    "  - Higher per-call costs but fewer compactions\n",
    "  \n",
    "- **Default (100k)**: Good balance for general long-running tasks\n",
    "\n",
    "**For ticket processing**: The 5k threshold works well because each ticket's workflow generates substantial tool results, but tickets are independent. After resolving Ticket A, you don't need its detailed KB searches when processing Ticket B.\n",
    "\n",
    "### Using a Different Model for Summarization\n",
    "\n",
    "You can also use a faster/cheaper model for generating summaries:\n",
    "\n",
    "```python\n",
    "compaction_control={\n",
    "    \"enabled\": True,\n",
    "    \"model\": \"claude-haiku-4-5\",  # Use Haiku for cost-effective summaries\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "w4oyvorzkn",
   "metadata": {},
   "source": [
    "### Custom Summary Prompts\n",
    "\n",
    "You can provide a custom prompt to guide how summaries are generated. This is especially useful for customer service workflows where you need to preserve specific types of information.\n",
    "\n",
    "For example, we could define a custom prompt based on our requirements:\n",
    "- **Ticket summaries** for all completed tickets\n",
    "- **Categories and priorities** assigned\n",
    "- **Teams routed to**\n",
    "- **Progress status** (tickets completed, tickets remaining)\n",
    "- **Next steps** in the workflow\n",
    "\n",
    "```python\n",
    "compaction_control={\n",
    "    \"enabled\": True,\n",
    "    \"summary_prompt\": \"\"\"You are processing customer support tickets from a queue.\n",
    "\n",
    "Create a focused summary that preserves:\n",
    "\n",
    "1. **COMPLETED TICKETS**: For each ticket you've fully processed:\n",
    "   - Ticket ID and customer name\n",
    "   - Issue category and priority assigned\n",
    "   - Team routed to\n",
    "   - Brief outcome\n",
    "\n",
    "2. **PROGRESS STATUS**: \n",
    "   - How many tickets you've completed\n",
    "   - Approximately how many remain in the queue\n",
    "\n",
    "3. **NEXT STEPS**: Continue processing the next ticket\n",
    "\n",
    "Format with clear sections and wrap in <summary></summary> tags.\"\"\"\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cwipn73ib9",
   "metadata": {},
   "source": [
    "## Compaction Without Tools: Simple Chat Loop\n",
    "\n",
    "While the examples above focus on tool-heavy agentic workflows, context compaction is also valuable for **simple conversational applications** where users drive the conversation.\n",
    "\n",
    " **Note:** The `compaction_control` parameter demonstrated above works with `tool_runner` for agentic workflows with tools. For simple chat applications without tools, you'll implement compaction manually using the same principles.\n",
    "\n",
    "Consider a chat application where users are having extended conversations with Claude—discussing complex topics, iterating on ideas, or working through problems. As the conversation grows, you face the same context accumulation challenges.\n",
    "\n",
    "**The Difference**: Instead of tool use triggering token growth, it's the back-and-forth conversation itself. Each exchange adds messages to the history:\n",
    "- User asks a question\n",
    "- Claude provides a detailed response\n",
    "- User asks for clarification or elaboration\n",
    "- Claude responds with more context\n",
    "- This repeats dozens or hundreds of times\n",
    "\n",
    "Without compaction, by turn 50 you're sending the entire conversation history (all 50 exchanges) on every API call.\n",
    "\n",
    "**The Solution**: Implement compaction manually in your chat loop using the same pattern:\n",
    "1. Track token usage after each turn\n",
    "2. When threshold is exceeded, request a summary\n",
    "3. Replace conversation history with the summary\n",
    "4. Continue the conversation with compressed context\n",
    "\n",
    "Let's see how to implement this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "m6akcmnsz09",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Chat with Claude (type 'quit' to exit, or just hit Enter to continue)\n",
      "This is a demonstration - try having a conversation and watch compaction trigger\n",
      "============================================================\n",
      "\n",
      "You: Help me understand how Python decorators work\n"
     ]
    }
   ],
   "source": [
    "#!/usr/bin/env python3\n",
    "\"\"\"\n",
    "Simple Compaction Example - User-Driven Chat Loop\n",
    "\n",
    "This shows the basic pattern for a chat application with compaction.\n",
    "No tools required - just a simple loop where the user drives continuation.\n",
    "\"\"\"\n",
    "\n",
    "# Configuration\n",
    "COMPACTION_THRESHOLD = 3000  # Compact when tokens exceed this (low for demo purposes)\n",
    "\n",
    "# Structured summarization prompt for compaction\n",
    "SUMMARY_PROMPT = \"\"\"You have been working on the task described above but have not yet completed it. Write a continuation summary that will allow you (or another instance of yourself) to resume work efficiently in a future context window where the conversation history will be replaced with this summary. Your summary should be structured, concise, and actionable. Include:\n",
    "\n",
    "1. **Task Overview**\n",
    "   - The user's core request and success criteria\n",
    "   - Any clarifications or constraints they specified\n",
    "\n",
    "2. **Current State**\n",
    "   - What has been completed so far\n",
    "   - Files created, modified, or analyzed (with paths if relevant)\n",
    "   - Key outputs or artifacts produced\n",
    "\n",
    "3. **Important Discoveries**\n",
    "   - Technical constraints or requirements uncovered\n",
    "   - Decisions made and their rationale\n",
    "   - Errors encountered and how they were resolved\n",
    "   - What approaches were tried that didn't work (and why)\n",
    "\n",
    "4. **Next Steps**\n",
    "   - Specific actions needed to complete the task\n",
    "   - Any blockers or open questions to resolve\n",
    "   - Priority order if multiple steps remain\n",
    "\n",
    "5. **Context to Preserve**\n",
    "   - User preferences or style requirements\n",
    "   - Domain-specific details that aren't obvious\n",
    "   - Any promises made to the user\n",
    "\n",
    "Be concise but complete—err on the side of including information that would prevent duplicate work or repeated mistakes.\n",
    " Write in a way that enables immediate resumption of the task.\n",
    "\n",
    "Wrap your summary in <summary></summary> tags.\"\"\"\n",
    "\n",
    "# Message history\n",
    "messages = []\n",
    "\n",
    "print(\"Chat with Claude (type 'quit' to exit, or just hit Enter to continue)\")\n",
    "print(\"This is a demonstration - try having a conversation and watch compaction trigger\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "# Simulate a conversation for demo purposes\n",
    "demo_messages = [\n",
    "    \"Help me understand how Python decorators work\",\n",
    "    \"Can you show me an example with a timing decorator?\",\n",
    "    \"How would I make a decorator that takes arguments?\",\n",
    "]\n",
    "\n",
    "for user_input in demo_messages:\n",
    "    print(f\"\\nYou: {user_input}\")\n",
    "\n",
    "    # Add user message\n",
    "    messages.append({\"role\": \"user\", \"content\": user_input})\n",
    "\n",
    "    # Get Claude's response\n",
    "    response = client.messages.create(\n",
    "        model=MODEL,\n",
    "        max_tokens=2048,\n",
    "        messages=messages,\n",
    "    )\n",
    "\n",
    "    messages.append(\n",
    "        {\n",
    "            \"role\": \"assistant\",\n",
    "            \"content\": response.content,\n",
    "        }\n",
    "    )\n",
    "\n",
    "    print(\"\\nClaude: \", end=\"\")\n",
    "    for block in response.content:\n",
    "        if block.type == \"text\":\n",
    "            print(f\"{block.text[:300]} ...\")\n",
    "\n",
    "    # Check if we should compact\n",
    "    usage = response.usage\n",
    "\n",
    "    # Calculate total tokens (includes cache tokens)\n",
    "    total_input_tokens = (\n",
    "        usage.input_tokens\n",
    "        + (usage.cache_creation_input_tokens or 0)\n",
    "        + (usage.cache_read_input_tokens or 0)\n",
    "    )\n",
    "    total_tokens = total_input_tokens + usage.output_tokens\n",
    "\n",
    "    cache_info = \"\"\n",
    "    if usage.cache_creation_input_tokens or usage.cache_read_input_tokens:\n",
    "        cache_info = f\" (cache: {usage.cache_creation_input_tokens or 0} write + {usage.cache_read_input_tokens or 0} read)\"\n",
    "\n",
    "    print(\n",
    "        f\"\\n[Tokens: {total_input_tokens} in{cache_info} + {usage.output_tokens} out = {total_tokens} total]\"\n",
    "    )\n",
    "\n",
    "    if total_tokens > COMPACTION_THRESHOLD:\n",
    "        print(f\"\\n{'=' * 60}\")\n",
    "        print(f\"🔄 Compacting conversation... {len(messages)} messages → \", end=\"\", flush=True)\n",
    "\n",
    "        # Get summary using structured prompt\n",
    "        summary_response = client.messages.create(\n",
    "            model=MODEL,\n",
    "            max_tokens=4096,\n",
    "            messages=messages + [{\"role\": \"user\", \"content\": SUMMARY_PROMPT}],\n",
    "        )\n",
    "\n",
    "        summary_text = \"\".join(\n",
    "            block.text for block in summary_response.content if block.type == \"text\"\n",
    "        )\n",
    "\n",
    "        # Replace history with summary\n",
    "        messages = [{\"role\": \"user\", \"content\": summary_text}]\n",
    "\n",
    "        print(\"1 message\")\n",
    "        print(f\"{'=' * 60}\\n\")\n",
    "\n",
    "print(f\"Final conversation messages: {messages[-1].get('content')}\")\n",
    "\n",
    "print(\"\\nDemo complete! In a real application, this loop would continue with user input.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b229j75wdjm",
   "metadata": {},
   "source": [
    "### Understanding the Chat Loop Pattern\n",
    "\n",
    "The example above demonstrates manual compaction in a conversational context. Here's how it works:\n",
    "\n",
    "**Key Components**:\n",
    "\n",
    "1. **Token Tracking**: After each response, calculate total tokens (input + output + cache tokens)\n",
    "2. **Threshold Check**: When total exceeds threshold, trigger compaction\n",
    "3. **Summary Request**: Send the same structured SUMMARY_PROMPT to Claude\n",
    "4. **History Replacement**: Replace entire message history with just the summary\n",
    "5. **Continue**: Next user message builds on the summary, not full history\n",
    "\n",
    "**When to Use This Pattern**:\n",
    "\n",
    "- **Extended brainstorming sessions**: Users exploring ideas with Claude over many turns\n",
    "- **Learning conversations**: Tutorials or explanations that span dozens of exchanges\n",
    "- **Iterative refinement**: Users providing feedback on drafts, designs, or solutions\n",
    "- **Chat applications**: Any multi-turn conversation interface\n",
    "\n",
    "**Key Differences from Tool Runner**:\n",
    "\n",
    "| Aspect | Tool Runner (Automatic) | Chat Loop (Manual) |\n",
    "|--------|------------------------|-------------------|\n",
    "| **Trigger** | Automatic when threshold reached | You implement threshold check |\n",
    "| **Summary** | SDK handles summary request | You make explicit API call |\n",
    "| **History Management** | SDK replaces messages | You manually replace list |\n",
    "| **Use Case** | Agentic workflows with tools | User-driven conversations |\n",
    "\n",
    "**Production Considerations**:\n",
    "\n",
    "1. **Adjust threshold**: Use larger thresholds for real applications\n",
    "2. **Customize summary prompt**: Tailor to your conversation type (brainstorming vs. technical support vs. tutoring)\n",
    "3. **Show user indicators**: Display a message like \"Summarizing conversation...\" so users understand the pause\n",
    "4. **Preserve key context**: Ensure the summary prompt captures domain-specific information your users care about\n",
    "\n",
    "This pattern gives you full control over when and how compaction happens, making it ideal for conversational applications where the SDK's automatic tool-runner compaction isn't available."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d71dwo1dayp",
   "metadata": {},
   "source": [
    "## Limitations and Considerations\n",
    "\n",
    "While automatic context compaction is powerful, there are important limitations to understand:\n",
    "\n",
    "### Server-Side Sampling Loops\n",
    "\n",
    "**Current Limitation**: Compaction does not work optimally with server-side sampling loops, such as server-side web search tools.\n",
    "\n",
    "**Why**: Cache tokens accumulate across sampling loops, which can trigger compaction prematurely based on cached content rather than actual conversation history.\n",
    "\n",
    "This feature works best with:\n",
    "- ✅ Client-side tools (like the customer service API in this cookbook)\n",
    "- ✅ Standard agentic workflows with regular tool use\n",
    "- ✅ File operations, database queries, API calls\n",
    "- ❌ Server-side Extended Thinking\n",
    "- ❌ Server-side web search tools\n",
    "\n",
    "### Information Loss\n",
    "\n",
    "**Trade-off**: Summaries inherently lose some information. While Claude is good at identifying key points, some details will be compressed or omitted.\n",
    "\n",
    "**In ticket processing**: \n",
    "- ✅ **Retained**: Ticket IDs, categories, priorities, teams, outcomes, progress status\n",
    "- ❌ **Lost**: Full knowledge base article text, complete drafted response text, detailed classification reasoning\n",
    "\n",
    "This is usually acceptable, you don't need every KB article and full response text in perpetuity, just the completion records.\n",
    "\n",
    "**Mitigation**:\n",
    "- Use custom summary prompts to preserve critical information\n",
    "- Set higher thresholds for tasks requiring extensive historical context\n",
    "- Structure your tasks to be modular (each phase builds on summaries, not raw details)\n",
    "\n",
    "### When NOT to Use Compaction\n",
    "\n",
    "Avoid compaction for:\n",
    "\n",
    "1. **Short tasks**: If your task completes within 50k-100k tokens, compaction adds unnecessary overhead\n",
    "2. **Tasks requiring full audit trails**: Some tasks need access to ALL previous details\n",
    "3. **Server-side sampling workflows**: As mentioned above, wait for this limitation to be addressed\n",
    "4. **Highly iterative refinement**: Tasks where each step critically depends on exact details from all previous steps\n",
    "\n",
    "### When TO Use Compaction\n",
    "\n",
    "Compaction is ideal for:\n",
    "\n",
    "1. **Sequential processing**: Like our ticket workflow—process multiple items one after another\n",
    "2. **Multi-phase workflows**: Where each phase can summarize progress before moving on\n",
    "3. **Iterative data processing**: Processing large datasets in chunks or entities one at a time\n",
    "4. **Extended analysis sessions**: Analyzing data across many entities\n",
    "5. **Batch operations**: Processing hundreds of items where each is independent\n",
    "\n",
    "**Ticket processing is a perfect use case** because:\n",
    "- Each ticket workflow is largely independent\n",
    "- You need completion summaries, not full tool results\n",
    "- Natural compaction points exist (after completing several tickets)\n",
    "- The workflow is iterative and sequential"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4pz1jmdidi",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "Automatic context compaction is a powerful feature that enables long-running agentic workflows to exceed typical context limits. In this cookbook, we've explored compaction through a customer service ticket processing workflow.\n",
    "\n",
    "### Next Steps\n",
    "\n",
    "Try implementing compaction in your own workflows:\n",
    "1. Identify natural compaction points (after processing each item, completing each phase, etc.)\n",
    "2. Start with an aggressive threshold (5k-10k) if you have clear per-item boundaries\n",
    "3. Use custom summary prompts to preserve critical information\n",
    "4. Monitor when compaction triggers and verify quality is maintained\n",
    "5. Adjust threshold based on your specific needs\n",
    "\n",
    "For more on effective context management, see [Effective Context Engineering for AI Agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "anthropic-cookbook (3.12.12)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
